The increase in violent incidents in public and private places has created an urgent need for advanced and efficient surveillance systems. Traditional methods rely on manual monitoring, which is impractical and inefficient for violence detection in complex scenarios. This paper presents a lightweight deep learning framework for real-time violence detection that uses knowledge distillation to improve public safety and security. We propose a teacher-student approach in which a large, pre-trained VGG16 model serves as the teacher and transfers knowledge to a significantly smaller, custom-built CNN student model. The methodology involves training the teacher model on a Kaggle-based dataset of violent and non-violent images, then training the student model with a combined distillation loss function that balances hard and soft targets. Our results show that the teacher model achieves an accuracy of 90.96%. The student model, with a 7.20x reduction in parameters, achieves an accuracy of 83.85%, retaining over 92% of the teacher’s accuracy. This framework offers a favorable trade-off between model size and accuracy, making it an efficient and scalable candidate for real-time deployment on mobile devices in smart city surveillance systems.
Introduction
The rapid growth of smart cities has heightened the need for advanced public safety solutions, particularly real-time violence detection in video surveillance. Manual monitoring of extensive video data is impractical, which has motivated research on deep learning methods such as CNNs and hybrid models for violence detection. While high-performing models such as VGG16 deliver accurate results, their large size and computational demands limit their deployment on resource-constrained edge devices.
To address this, this study proposes a knowledge distillation framework that transfers knowledge from a large, pre-trained teacher model (VGG16) to a smaller, custom CNN student model. This approach significantly reduces the model size while retaining most of the teacher’s performance, making it suitable for real-time smart city applications.
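As a minimal illustration of the teacher-student idea, the sketch below computes a distillation loss in the commonly used form: a weighted sum of cross-entropy against the ground-truth label (hard target) and a temperature-scaled KL divergence between teacher and student outputs (soft targets). The function names, the weight `alpha`, the temperature, and the example logits are assumptions made for illustration only; this section does not specify the exact formulation used in the study.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=4.0):
    """Weighted sum of a hard-target and a soft-target loss.

    alpha and temperature are illustrative defaults (assumptions),
    not values reported in the paper.
    """
    # Hard-target term: cross-entropy between student output and true label.
    hard = -math.log(softmax(student_logits)[label])

    # Soft-target term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)

    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


# Hypothetical two-class logits: index 0 = violent, index 1 = non-violent.
loss = distillation_loss(student_logits=[1.2, -0.3],
                         teacher_logits=[2.5, -1.0],
                         label=0)
print(f"distillation loss: {loss:.4f}")
```

When the student reproduces the teacher's logits exactly, the soft-target term vanishes and only the weighted cross-entropy remains, which is the sense in which the loss balances the two kinds of target.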
The student model, with 7.2 times fewer parameters, achieved an accuracy of 83.85%, retaining over 92% of the teacher’s accuracy (90.96%) on violence detection. The framework demonstrates effective model compression and competitive accuracy, highlighting its potential for efficient deployment in resource-limited smart surveillance systems.
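The retention figure follows directly from the two reported accuracies. The short sketch below reproduces that arithmetic and also expresses the 7.2x parameter reduction as the fraction of parameters the student keeps; the helper names are hypothetical, and only the three numbers quoted above are taken from the paper.

```python
def retention_percent(student_acc, teacher_acc):
    """Share of the teacher's accuracy that the student retains, in percent."""
    return 100.0 * student_acc / teacher_acc


def remaining_parameter_percent(reduction_factor):
    """Student parameter count as a percentage of the teacher's."""
    return 100.0 / reduction_factor


teacher_acc = 90.96   # reported teacher accuracy (%)
student_acc = 83.85   # reported student accuracy (%)
reduction = 7.2       # reported parameter reduction factor

print(f"accuracy retained: {retention_percent(student_acc, teacher_acc):.2f}%")
print(f"accuracy gap: {teacher_acc - student_acc:.2f} percentage points")
print(f"parameters kept: {remaining_parameter_percent(reduction):.2f}%")
```

In other words, the student keeps roughly 92.2% of the teacher's accuracy, at a cost of about 7.1 percentage points, while using about 13.9% of its parameters.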
The paper includes a literature review, a methodology section detailing the teacher-student training with a custom distillation loss, and results showing how the approach balances accuracy and model efficiency.
Conclusion
This research demonstrates the application of a knowledge distillation framework to develop an efficient, lightweight, and accurate real-time violence detection system. The results confirm that the teacher model, while larger and more complex, achieved an accuracy of 90.96%. The core finding, however, is the effectiveness of the knowledge distillation process: the student model, with a 7.20x reduction in the number of parameters, achieved an accuracy of 83.85%, retaining over 92% of the teacher model's accuracy. In conclusion, this study shows that knowledge distillation is an effective and efficient technique for model compression, offering a practical route to building real-time surveillance systems.
References
[1] K. R. Krishna, V. S. S. Vishak, and V. C. V. Vyshnavi, “Real time violence detection,” Journal of Science and Technology, vol. 9, no. 4, pp. 1–5, Apr. 2024.
[2] H. A. H. Baca, F. d. L. P. Valdivia, and J. C. G. Caceres, “Efficient human violence recognition for surveillance in real time,” Sensors, vol. 24, no. 2, p. 668, Jan. 2024, doi: 10.3390/s24020668.
[3] A. A. S. A. Arun, S. S. M. R. Sri Skandha, K. Esha, and N. Nathiya, “Human violence detection using deep learning techniques,” in Journal of Physics: Conference Series, vol. 2318, no. 1, p. 012003. IOP Publishing, 2022, doi: 10.1088/1742-6596/2318/1/012003.
[4] A. Verma, “Real-Time Violence Detection in Surveillance Streams,” SSRN, New Delhi, India, 2024.
[5] P. Negre et al., “Literature review of deep-learning-based detection of violence in video,” Sensors, vol. 24, no. 12, p. 4016, Jun. 2024, doi: 10.3390/s24124016.
[6] G. Sudeepthi, R. V. A. Reddy, T. Vaishanvi, and C. Swapna, “Smart surveillance for violence detection,” International Journal for Multidisciplinary Research (IJFMR), vol. 6, no. 6, Nov.-Dec. 2024.
[7] E. Veltmeijer, M. Franken, and C. Gerritsen, “Real-time violence detection and localization through subgroup analysis,” Multimedia Tools and Applications, vol. 84, pp. 3793–3807, May 2024, doi: 10.1007/s11042-024-19144-5.
[8] H. Khan et al., “Violence detection from industrial surveillance videos using deep learning,” IEEE Access, vol. 13, pp. 15363–15375, 2025, doi: 10.1109/ACCESS.2025.3531213.
[9] S. A. Sumon et al., “Violence detection by pretrained modules with different deep learning approaches,” Vietnam Journal of Computer Science, vol. 7, no. 1, pp. 19–40, 2020, doi: 10.1142/S2196888820500013.
[10] M. R. Khan et al., “Multimodal deep learning for violence detection: VGGish and MobileViT integration with knowledge distillation on Jetson Nano,” IEEE Open Journal of the Communications Society, vol. 6, 2025, doi: 10.1109/OJCOMS.2024.3520703.
[11] R. R. Ojha, H. Chawdary, and S. Saraswat, “Enhancing public safety: Real-time violence detection and notification system,” Procedia Computer Science, vol. 258, pp. 2988–2995, 2025, doi: 10.1016/j.procs.2025.04.558.
[12] M. Ahsan, “Real-time violence detection in smart cities using lightweight spatiotemporal deep learning models,” Journal of Artificial Intelligence and Metaheuristics (JAIM), vol. 9, no. 2, pp. 19–36, 2025, doi: 10.54216/JAIM.090202.
[13] M. A. Soeleman, C. Supriyanto, D. P. Prabowo, and P. N. Andono, “Video violence detection using LSTM and transformer networks through grid search-based hyperparameters optimization,” International Journal of Safety and Security Engineering, vol. 12, no. 5, pp. 615–622, Nov. 2022, doi: 10.18280/ijsse.120510.
[14] S. A. A. Akash et al., “Human violence detection using deep learning techniques,” Journal of Physics: Conference Series, vol. 2318, no. 1, p. 012003, 2022, doi: 10.1088/1742-6596/2318/1/012003.